207 research outputs found

    Red Panda: A Novel Method for Detecting Variation in Single-Cell RNA Sequencing

    Single-cell sequencing enables the rapid acquisition of genomic and transcriptomic data from individual cells to better understand genetic diseases, such as cancer or autoimmune disorders, which are often affected by changes in rare cells. Currently, no existing software is aimed at identifying single nucleotide variations or micro (1-50 bp) insertions and deletions in single-cell RNA sequencing (scRNA-seq) data, yet generating high-quality variant data is vital to the study of these diseases, among others. Our goal is to create such a tool and use in-house sequencing to validate its effectiveness. Our software employs the unique information found in scRNA-seq data to identify variants more accurately, in ways not possible with software designed for bulk sequencing. We intentionally isolate variants into three different classes: homozygous-looking, heterozygous, and bimodally-distributed heterozygous, the last of which can only be identified in scRNA-seq. To properly validate the results from this method, variants were called on three data sets: scRNA-seq and exome sequencing jointly performed on human articular chondrocytes, scRNA-seq from mouse embryonic fibroblasts (MEFs), and simulated data stemming from the MEF alignments. The chondrocyte exome sequencing was used to validate the chondrocyte scRNA-seq results. On average, Red Panda shared 913 variants with the exome, with a Positive Predictive Value (PPV) of 45.0%. Other tools (FreeBayes, GATK HaplotypeCaller, GATK UnifiedGenotyper, and Platypus) ranged from 65 to 705 variants and from 5.8% to 31.7% PPV. Sanger sequencing was performed on a subset of the variants identified in the MEFs, and simulated data was generated to assess the sensitivity of each tool. On the simulated data, Red Panda had the highest sensitivity at 72.44%; the other tools ranged from 18.22% to 39.09%. We show that our method provides a novel and improved mechanism to identify variants in scRNA-seq as compared to currently existing software.
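
    The two metrics reported in this abstract can be made concrete with a minimal sketch. It assumes the standard definitions, PPV = TP / (TP + FP) and sensitivity = TP / (TP + FN). The function names and the counts in the usage lines are hypothetical illustrations, not figures or code from the study.

```python
def positive_predictive_value(true_positives: int, false_positives: int) -> float:
    """Fraction of called variants that are confirmed by the truth set."""
    called = true_positives + false_positives
    if called == 0:
        raise ValueError("no variants were called")
    return true_positives / called


def sensitivity(true_positives: int, false_negatives: int) -> float:
    """Fraction of true variants that the caller recovered."""
    real = true_positives + false_negatives
    if real == 0:
        raise ValueError("truth set is empty")
    return true_positives / real


if __name__ == "__main__":
    # Hypothetical counts, chosen only to show the arithmetic.
    print(positive_predictive_value(true_positives=45, false_positives=55))
    print(sensitivity(true_positives=3, false_negatives=1))
```

    A high PPV means few false calls among the reported variants, while a high sensitivity means few true variants were missed; a caller can score well on one and poorly on the other, which is why the abstract reports both.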

    Exome Screening to Identify Loss-of-Function Mutations in the Rhesus Macaque for Development of Preclinical Models of Human Disease

    BACKGROUND: Exome sequencing has been utilized to identify genetic variants associated with disease in humans. Identification of loss-of-function mutations with exome sequencing in rhesus macaques (Macaca mulatta) could lead to valuable animal models of genetic disease. Attempts have been made to identify variants in rhesus macaques by aligning exome data against the rheMac2 draft genome. However, such efforts have been impaired by the incompleteness and annotation errors associated with rheMac2. We wished to determine whether aligning exome reads against our new, improved rhesus genome, MacaM, could be used to identify high impact, loss-of-function mutations in rhesus macaques that would be relevant to human disease. RESULTS: We compared alignments of exome reads from four rhesus macaques, the reference animal and three unrelated animals, against rheMac2 and MacaM. Substantially more reads aligned against MacaM than rheMac2. We followed the Broad Institute's Best Practice guidelines for variant discovery, which use the Genome Analysis Toolkit, to identify high impact mutations. When rheMac2 was used as the reference genome, a large number of apparent false positives were identified. When MacaM was used as the reference genome, the number of false positives was greatly reduced. After examining the variant analyses conducted with MacaM as reference genome, we identified two putative loss-of-function mutations, in the heterozygous state, in genes related to human health. Sanger sequencing confirmed the presence of these mutations. We followed the transmission of one of these mutations (in the butyrylthiocholine gene) through three generations of rhesus macaques. Further, we demonstrated a functional decrease in butyrylthiocholinesterase activity similar to that observed in human heterozygotes with loss-of-function mutations in the same gene.
    CONCLUSIONS: The new MacaM genome can be effectively utilized to identify loss-of-function mutations in rhesus macaques without generating a high level of false positives. In some cases, heterozygotes may be immediately useful as models of human disease. For diseases where homozygous mutants are needed, directed breeding of loss-of-function heterozygous animals could be used to create rhesus macaque models of human genetic disease. The approach we describe here could be applied to other mammals, but only if their genomes have been improved beyond draft status.

    Subclinical infection of macaques and baboons with a baboon simarterivirus

    Simarteriviruses (Arteriviridae: Simarterivirinae) are commonly found at high titers in the blood of African monkeys but do not cause overt disease in these hosts. In contrast, simarteriviruses cause severe disease in Asian macaques upon accidental or experimental transmission. Here, we sought to better understand the host-dependent drivers of simarterivirus pathogenesis by infecting olive baboons (n = 4) and rhesus monkeys (n = 4) with the simarterivirus Southwest baboon virus 1 (SWBV-1). Surprisingly, none of the animals in our study showed signs of disease following SWBV-1 inoculation. Three animals (two rhesus monkeys and one olive baboon) became infected and sustained high levels of SWBV-1 viremia for the duration of the study. The course of SWBV-1 infection was highly predictable: plasma viremia peaked between 1 × 10^7 and 1 × 10^8 vRNA copies/mL at 3–10 days post-inoculation, which was followed by a relative nadir and then establishment of a stable set-point between 1 × 10^6 and 1 × 10^7 vRNA copies/mL for the remainder of the study (56 days). We characterized cellular and antibody responses to SWBV-1 infection in these animals, demonstrating that macaques and baboons mount similar responses to SWBV-1 infection, yet these responses are ineffective at clearing SWBV-1 infection. SWBV-1 sequencing revealed the accumulation of non-synonymous mutations in a region of the genome that corresponds to an immunodominant epitope in the simarterivirus major envelope glycoprotein GP5, which likely contribute to viral persistence by enabling escape from host antibodies.

    Le rôle des anaphores dans la mise en place des relations de cohérence dans le discours: l'hypothèse de J.R. Hobbs [The role of anaphors in establishing coherence relations in discourse: J.R. Hobbs's hypothesis]

    For Hobbs (1979), the interpretation of inter-sentential anaphors 'falls out' as a side effect of using a coherence relation to integrate two discourse units. In attempting to clarify the nature of the interactions between coherence relations and the functioning of anaphors in the comprehension of multi-propositional texts, the article seeks to show that Hobbs's hypothesis is only partly valid. First, the full reference of anaphors does not merely 'fall out' from the choice of a particular coherence relation to integrate two discourse units: it is essential to the effective implementation of that relation. Second, it also serves to select the discourse unit with which the incoming unit will be integrated, and to organize the discourse into a hierarchical structure. Finally, the integration of discourse units does not happen all at once, but may be spread over at least three distinct stages.

    Red Panda: A Novel Method for Detecting Variants in Single-Cell RNA Sequencing

    BACKGROUND: Single-cell sequencing enables us to better understand genetic diseases, such as cancer or autoimmune disorders, which are often affected by changes in rare cells. Currently, no existing software is aimed at identifying single nucleotide variations or micro (1-50 bp) insertions and deletions in single-cell RNA sequencing (scRNA-seq) data. Generating high-quality variant data is vital to the study of the aforementioned diseases, among others. RESULTS: In this study, we report the design and implementation of Red Panda, a novel method to accurately identify variants in scRNA-seq data. Variants were called on scRNA-seq data from human articular chondrocytes, mouse embryonic fibroblasts (MEFs), and simulated data stemming from the MEF alignments. Red Panda had the highest Positive Predictive Value at 45.0%, while other tools (FreeBayes, GATK HaplotypeCaller, GATK UnifiedGenotyper, Monovar, and Platypus) ranged from 5.8% to 41.53%. From the simulated data, Red Panda had the highest sensitivity at 72.44%. CONCLUSIONS: We show that our method provides a novel and improved mechanism to identify variants in scRNA-seq as compared to currently existing software. However, methods for identification of genomic variants using scRNA-seq data can still be improved.

    Multi-Messenger Gravitational Wave Searches with Pulsar Timing Arrays: Application to 3C66B Using the NANOGrav 11-year Data Set

    When galaxies merge, the supermassive black holes in their centers may form binaries and emit low-frequency gravitational radiation in the process. In this paper we consider the galaxy 3C66B, which was used as the target of the first multi-messenger search for gravitational waves. Due to the observed periodicities present in the photometric and astrometric data of the source, it has been theorized to contain a supermassive black hole binary. Its apparent 1.05-year orbital period would place the gravitational wave emission directly in the pulsar timing band. Since the first pulsar timing array study of 3C66B, revised models of the source have been published, and timing array sensitivities and techniques have improved dramatically. With these advances, we further constrain the chirp mass of the potential supermassive black hole binary in 3C66B to less than $(1.65\pm0.02) \times 10^9~{M_\odot}$ using data from the NANOGrav 11-year data set. This upper limit provides a factor of 1.6 improvement over previous limits, and a factor of 4.3 over the first search done. Nevertheless, the most recent orbital model for the source is still consistent with our limit from pulsar timing array data. In addition, we are able to quantify the improvement made by the inclusion of source properties gleaned from electromagnetic data relative to 'blind' pulsar timing array searches. With these methods, it is apparent that it is not necessary to obtain exact a priori knowledge of the period of a binary to gain meaningful astrophysical inferences. Comment: 14 pages, 6 figures. Accepted by Ap
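
    As a rough illustration of the quantities in this abstract, the sketch below assumes the standard chirp-mass definition, M_c = (m1 m2)^(3/5) / (m1 + m2)^(1/5), and a circular binary, for which the dominant gravitational-wave frequency is twice the orbital frequency. The function names and the use of a Julian year are choices made here for illustration and are not taken from the paper.

```python
# Length of a Julian year in seconds (assumed unit convention for the period).
SECONDS_PER_JULIAN_YEAR = 365.25 * 86400.0


def chirp_mass(m1: float, m2: float) -> float:
    """Chirp mass of a binary, in the same units as the component masses."""
    if m1 <= 0 or m2 <= 0:
        raise ValueError("component masses must be positive")
    return (m1 * m2) ** 0.6 / (m1 + m2) ** 0.2


def gw_frequency_hz(orbital_period_years: float) -> float:
    """Dominant gravitational-wave frequency of a circular binary, in Hz."""
    if orbital_period_years <= 0:
        raise ValueError("orbital period must be positive")
    # For a circular orbit the signal is emitted at twice the orbital frequency.
    return 2.0 / (orbital_period_years * SECONDS_PER_JULIAN_YEAR)


if __name__ == "__main__":
    # Equal-mass case: the chirp mass is the component mass divided by 2**0.2.
    print(chirp_mass(1.0, 1.0))
    # The 1.05-year period quoted in the abstract.
    print(gw_frequency_hz(1.05))
```

    Under these assumptions, a 1.05-year orbital period corresponds to a gravitational-wave frequency of about 60 nHz, which lies in the nanohertz band that pulsar timing arrays probe.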

    A New Rhesus Macaque Assembly and Annotation for Next-Generation Sequencing Analyses

    BACKGROUND: The rhesus macaque (Macaca mulatta) is a key species for advancing biomedical research. Like all draft mammalian genomes, the draft rhesus assembly (rheMac2) has gaps, sequencing errors and misassemblies that have prevented automated annotation pipelines from functioning correctly. Another rhesus macaque assembly, CR_1.0, is also available but is substantially more fragmented than rheMac2 with smaller contigs and scaffolds. Annotations for these two assemblies are limited in completeness and accuracy. High quality assembly and annotation files are required for a wide range of studies including expression, genetic and evolutionary analyses. RESULTS: We report a new de novo assembly of the rhesus macaque genome (MacaM) that incorporates both the original Sanger sequences used to assemble rheMac2 and new Illumina sequences from the same animal. MacaM has a weighted average (N50) contig size of 64 kilobases, more than twice the size of the rheMac2 assembly and almost five times the size of the CR_1.0 assembly. The MacaM chromosome assembly incorporates information from previously unutilized mapping data and preliminary annotation of scaffolds. Independent assessment of the assemblies using Ion Torrent read alignments indicates that MacaM is more complete and accurate than rheMac2 and CR_1.0. We assembled messenger RNA sequences from several rhesus tissues into transcripts which allowed us to identify a total of 11,712 complete proteins representing 9,524 distinct genes. Using a combination of our assembled rhesus macaque transcripts and human transcripts, we annotated 18,757 transcripts and 16,050 genes with complete coding sequences in the MacaM assembly. Further, we demonstrate that the new annotations provide greatly improved accuracy as compared to the current annotations of rheMac2. Finally, we show that the MacaM genome provides an accurate resource for alignment of reads produced by RNA sequence expression studies. 
    CONCLUSIONS: The MacaM assembly and annotation files provide a substantially more complete and accurate representation of the rhesus macaque genome than rheMac2 or CR_1.0 and will serve as an important resource for investigators conducting next-generation sequencing studies with nonhuman primates. REVIEWERS: This article was reviewed by Dr. Lutz Walter, Dr. Soojin Yi and Dr. Kateryna Makova.

    The NANOGrav 11-year Data Set: High-precision Timing of 45 Millisecond Pulsars

    We present high-precision timing data over time spans of up to 11 years for 45 millisecond pulsars observed as part of the North American Nanohertz Observatory for Gravitational Waves (NANOGrav) project, aimed at detecting and characterizing low-frequency gravitational waves. The pulsars were observed with the Arecibo Observatory and/or the Green Bank Telescope at frequencies ranging from 327 MHz to 2.3 GHz. Most pulsars were observed with approximately monthly cadence, and six high-timing-precision pulsars were observed weekly. All were observed at widely separated frequencies at each observing epoch in order to fit for time-variable dispersion delays. We describe our methods for data processing, time-of-arrival (TOA) calculation, and the implementation of a new, automated method for removing outlier TOAs. We fit a timing model for each pulsar that includes spin, astrometric, and (for binary pulsars) orbital parameters; time-variable dispersion delays; and parameters that quantify pulse-profile evolution with frequency. The timing solutions provide three new parallax measurements, two new Shapiro delay measurements, and two new measurements of significant orbital-period variations. We fit models that characterize sources of noise for each pulsar. We find that 11 pulsars show significant red noise, with generally smaller spectral indices than typically measured for non-recycled pulsars, possibly suggesting a different origin. A companion paper uses these data to constrain the strength of the gravitational-wave background

    The NANOGrav Nine-year Data Set: Observations, Arrival Time Measurements, and Analysis of 37 Millisecond Pulsars

    We present high-precision timing observations spanning up to nine years for 37 millisecond pulsars monitored with the Green Bank and Arecibo radio telescopes as part of the North American Nanohertz Observatory for Gravitational Waves (NANOGrav) project. We describe the observational and instrumental setups used to collect the data, and the methodology applied for calculating pulse times of arrival; these include novel methods for measuring instrumental offsets and characterizing low signal-to-noise ratio timing results. The time-of-arrival data are fit to a physical timing model for each source, including terms that characterize time-variable dispersion measure and frequency-dependent pulse shape evolution. In conjunction with the timing model fit, we have performed a Bayesian analysis of a parameterized timing noise model for each source, and detect evidence for excess low-frequency, or "red," timing noise in 10 of the pulsars. For 5 of these cases this is likely due to interstellar medium propagation effects rather than intrinsic spin variations. Subsequent papers in this series will present further analysis of this data set aimed at detecting or limiting the presence of nanohertz-frequency gravitational wave signals.